Step 4 (partial): DDL hardening — V-L2-G1/H1/H2/I1/J1/K1 #67
Merged
…rov fix
Step 4 partial. Lands the mechanical DDL-correctness work (V-L2-G1,
H1, H2, I1, J1, K1). The bigger architectural items in Step 4 stay
filed (V-L2-A1 sqlparser replacement, V-L2-B2 composite-id hashing,
V-L2-F1 dialect split) — each needs a dedicated session.
V-L2-G1 — identifier validation:
Added `validate_identifier` / `must_validate_identifier` to
overlay.rs accepting only `^[A-Za-z_][A-Za-z0-9_]*$`. Every
user-controlled identifier flowing into `INSERT OR IGNORE INTO
verisimdb_metadata VALUES ('{}', ...)` is now validated at
codegen time, so a table named `posts'); DROP TABLE x;--` is
rejected with a structured error instead of injected. Two new
test sets cover 5 safe names and 10 attack strings.
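A minimal sketch of the check described above, assuming a hand-rolled character test rather than a regex dependency. The `InvalidIdentifier` type and the `metadata_value_literal` helper are hypothetical names for illustration; the real error type and call sites in overlay.rs are not shown in this PR text.

```rust
use std::fmt;

/// Hypothetical structured error; the actual error type in overlay.rs may differ.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidIdentifier {
    pub name: String,
}

impl fmt::Display for InvalidIdentifier {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid SQL identifier: {:?}", self.name)
    }
}

impl std::error::Error for InvalidIdentifier {}

/// Accepts only names matching `^[A-Za-z_][A-Za-z0-9_]*$`.
pub fn validate_identifier(name: &str) -> Result<&str, InvalidIdentifier> {
    let mut chars = name.chars();
    // First character: ASCII letter or underscore (an empty name fails here).
    let first_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_');
    // Remaining characters: ASCII letters, digits, or underscore.
    let rest_ok = chars.all(|c| c.is_ascii_alphanumeric() || c == '_');
    if first_ok && rest_ok {
        Ok(name)
    } else {
        Err(InvalidIdentifier { name: name.to_string() })
    }
}

/// Hypothetical helper: builds the quoted literal only after validation passes.
pub fn metadata_value_literal(table: &str) -> Result<String, InvalidIdentifier> {
    let table = validate_identifier(table)?;
    Ok(format!("'{}'", table))
}
```

Because quotes, semicolons, spaces, and dashes all fall outside the allowed character set, the attack string from the description is rejected before any SQL text is assembled.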
V-L2-K1 — provenance latest-per-entity view fixed:
The previous greatest-N-per-group subquery had a broken
correlation (inner MAX subquery referenced the outer
uncorrelated row rather than the alias). Replaced with the
canonical ROW_NUMBER() OVER (PARTITION BY entity_id ORDER BY
timestamp DESC) = 1 pattern, which works on SQLite 3.25+ and
PostgreSQL. The integration test for the view now asserts the
new pattern and the absence of the old broken correlation.
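For illustration, the window-function pattern might be emitted roughly as follows. This is a sketch under assumptions: the view name `provenance_latest`, the unprefixed table name `provenance_log`, and the helper `uses_row_number_pattern` are placeholders, not the names used in the actual codegen.

```rust
/// Sketch of a latest-per-entity view using ROW_NUMBER().
/// View and table names are assumptions for illustration only.
pub const LATEST_PROVENANCE_VIEW: &str = "\
CREATE VIEW provenance_latest AS
SELECT *
FROM (
    SELECT p.*,
           ROW_NUMBER() OVER (PARTITION BY entity_id ORDER BY timestamp DESC) AS rn
    FROM provenance_log AS p
) AS ranked
WHERE rn = 1;
";

/// Hypothetical check in the spirit of the integration test described above:
/// the new pattern is present and no MAX(...) subquery remains.
pub fn uses_row_number_pattern(sql: &str) -> bool {
    sql.contains("ROW_NUMBER() OVER (PARTITION BY entity_id ORDER BY timestamp DESC)")
        && !sql.contains("MAX(")
}
```

Each entity's rows are ranked newest first, and the outer filter keeps only rank 1. In this sketch the `rn` column is exposed by the view; a real view would probably list its columns explicitly.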
V-L2-H1 + V-L2-H2 — temporal exactness:
- CREATE UNIQUE INDEX (was non-unique partial); enforces exactly
one current row per (entity, table) at DB level instead of
relying on application-layer discipline.
- CHECK valid_to IS NULL OR valid_to >= valid_from.
- CHECK version >= 1.
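The three constraints above could be generated along these lines. This is only a sketch: the column names `entity_id` and `table_name` and the index name are assumptions, since the text only says "(entity, table)", and the caller is assumed to pass an already validated table name.

```rust
/// Partial unique index: at most one row with valid_to IS NULL per key.
/// Column names `entity_id` and `table_name` are assumed for illustration.
pub fn temporal_current_row_index(table: &str) -> String {
    format!(
        "CREATE UNIQUE INDEX IF NOT EXISTS idx_{t}_current ON {t} (entity_id, table_name) WHERE valid_to IS NULL;",
        t = table
    )
}

/// Table-level CHECK clauses quoted from the description above.
pub const TEMPORAL_CHECKS: [&str; 2] = [
    "CHECK (valid_to IS NULL OR valid_to >= valid_from)",
    "CHECK (version >= 1)",
];
```

With the `WHERE valid_to IS NULL` predicate, closed historical rows stay unconstrained, while a second open row for the same key is rejected by the database itself.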
V-L2-I1 — lineage self-edges forbidden:
CHECK NOT (source_entity = target_entity AND source_table =
target_table). Cycle prevention beyond self-edges is V-L1-G1
(runtime concern, separate ADR).
V-L2-J1 — closed-set CHECKs and the missing FK:
- provenance_log.operation ∈ {insert,update,delete,transform}
- lineage_graph.derivation_type ∈ {copy,transform,aggregate,join,filter}
- temporal_versions.operation ∈ {insert,update,rollback}
- access_policies.access_level ∈ {read,write,admin,deny}
- access_policies.active ∈ {0,1}
- simulation_branches.status ∈ {active,merged,abandoned}
- simulation_deltas.operation ∈ {insert,update,delete}
- simulation_branches.parent_branch REFERENCES
simulation_branches.branch_id (self-FK; was declared but
un-enforced).
DDL tests added for every constraint above (7 new test functions).
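As a rough sketch, the closed-set CHECKs listed above could come from one small helper. The function name `closed_set_check`, the constant `PARENT_BRANCH_COLUMN`, and the `TEXT` column type are assumptions for illustration, not the code in this PR.

```rust
/// Builds `CHECK (col IN ('a', 'b', ...))` from a fixed list of allowed values.
/// Values are expected to be compile-time constants, so they are not escaped here.
pub fn closed_set_check(column: &str, allowed: &[&str]) -> String {
    let list = allowed
        .iter()
        .map(|v| format!("'{}'", v))
        .collect::<Vec<_>>()
        .join(", ");
    format!("CHECK ({} IN ({}))", column, list)
}

/// Self-referencing foreign key column definition; the TEXT type is an assumption.
pub const PARENT_BRANCH_COLUMN: &str =
    "parent_branch TEXT REFERENCES simulation_branches(branch_id)";
```

For example, `closed_set_check("status", &["active", "merged", "abandoned"])` yields `CHECK (status IN ('active', 'merged', 'abandoned'))`. Note that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection, so the self-FK needs that pragma to take effect there.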
Verified locally:
- cargo fmt --all -- --check clean
- cargo clippy --all-targets -- -D warnings clean
- cargo test reports 49 lib + 9 integration = 58 tests, 0 failed
(was 42 + 9 = 51; +7 codegen tests)
Closes #39, #40, #41, #42, #43
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Step 4 partial — DDL hardening. Lands the mechanical correctness fixes (V-L2-G1, H1, H2, I1, J1, K1). The larger architectural items in Step 4 stay filed and unresolved (V-L2-A1 sqlparser replacement, V-L2-B2 composite-id hashing, V-L2-F1 dialect split) — each needs a dedicated session.
Stacked on #37 (Step 3). Merge order: #24 → #33 → #37 → this PR.
What changes
- `validate_identifier` + `must_validate_identifier` guard every user-controlled name flowing into generated DDL. A table named `posts'); DROP TABLE x;--` is now rejected at codegen time. 10-attack rejection test. Closes #39 (V-L2-G1: validate + escape every user-controlled identifier in generated DDL).
- The provenance latest-per-entity view now uses `ROW_NUMBER() OVER (PARTITION BY entity_id ORDER BY timestamp DESC)`. Old self-correlation bug is gone. Closes #40 (V-L2-K1: provenance latest-per-entity view has broken correlation).
- `verisimdb_temporal_versions` gets a partial UNIQUE INDEX (`WHERE valid_to IS NULL`), CHECK `valid_to IS NULL OR valid_to >= valid_from`, CHECK `version >= 1`. Closes #41 (V-L2-H1: temporal versions need UNIQUE partial index + valid_to CHECK).
- Closed-set CHECKs across the overlay tables, plus the self-FK on `simulation_branches.parent_branch`. Closes #43 (V-L2-J1: simulation parent_branch FK missing + enum-shape CHECKs missing across overlay).

7 new DDL tests assert each constraint is present.
What stays open
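- V-L2-A1 (sqlparser replacement): involves the `sqlparser` crate. Substantial dep + behaviour change.
- V-L2-B2 (composite-id hashing): `'::'`-separator collision. Needs a design choice (hash vs unit-separator).
- V-L2-F1 (dialect split).

These are real Step 4 work items, just not in scope here.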
Test plan
- `cargo fmt --all -- --check` clean
- `cargo clippy --all-targets -- -D warnings` clean
- `cargo test` reports 49 lib + 9 integration = 58 tests, 0 failed (was 42 + 9 = 51)

🤖 Generated with Claude Code